[BugFix] Fix NOT IN including null/missing rows (#5165) #5337
… filter (opensearch-project#5165)
When PredicateAnalyzer handles a SEARCH expression with complemented points (NOT IN) and nullAs=UNKNOWN, the generated DSL query now includes an exists filter to exclude null/missing documents. This aligns with SQL three-valued logic, where NULL NOT IN (...) evaluates to UNKNOWN, not TRUE.
Signed-off-by: Songkan Tang <songkant@amazon.com>
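The effect described in the commit message can be illustrated with a small, self-contained sketch. It does not use the real `PredicateAnalyzer` or any OpenSearch API; the class and method names below (`NotInNullSketch`, `notInWithoutExists`, `notInWithExists`) are hypothetical and only model the filtering semantics: a bare negated membership test lets null values through, while combining it with an exists-style check drops them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model only; not the project's code. A null entry stands for a null/missing field. */
class NotInNullSketch {

  /** Models a negated membership test on its own: null values are not in the set, so they pass. */
  static List<String> notInWithoutExists(List<String> values, Set<String> excluded) {
    List<String> out = new ArrayList<>();
    for (String v : values) {
      if (!excluded.contains(v)) {
        out.add(v);
      }
    }
    return out;
  }

  /** Models "exists AND NOT IN": null values are rejected first, matching SQL three-valued logic. */
  static List<String> notInWithExists(List<String> values, Set<String> excluded) {
    List<String> out = new ArrayList<>();
    for (String v : values) {
      if (v != null && !excluded.contains(v)) {
        out.add(v);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("a", "b", null, "c");
    // HashSet is used because it tolerates contains(null); Set.of(...) would throw.
    Set<String> excluded = new HashSet<>(Arrays.asList("a", "b"));

    System.out.println(notInWithoutExists(values, excluded)); // prints [null, c]
    System.out.println(notInWithExists(values, excluded));    // prints [c]
  }
}
```

In this model the first result corresponds to the behavior reported in #5165 and the second to the behavior after the fix; the actual DSL generated by the project may differ in structure.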
Decision Log
Root Cause:
Approach: Added
Alternatives Rejected:
Pitfalls: The
Things to Watch: Any future Sarg patterns with
Description
When `PredicateAnalyzer` handles a `SEARCH` expression with complemented points (`NOT IN`) and `nullAs=UNKNOWN`, the generated OpenSearch DSL query was missing an `exists` filter. This caused documents with null/missing field values to be incorrectly included in `NOT IN` results.
Root cause: In the `case UNKNOWN` branch of the SEARCH handler (~line 728), the expression was returned as-is without adding an `exists` check for complemented points. SQL three-valued logic dictates that `NULL NOT IN (...)` evaluates to `UNKNOWN` (not `TRUE`), so null rows must be excluded.
Fix: Check `isSearchWithComplementedPoints(call)` in the `UNKNOWN` branch and wrap the result with an `AND exists` filter, consistent with the existing `FALSE` branch behavior.
Related Issues
Resolves #5165
Check List
-s)
`spotlessCheck` passed